我们介绍了一个免费的开源的Pytorch库,用于培训具有差异隐私的深度学习模型(在Https://opacus.ai托管)。遮光罩是为了简单,灵活性和速度而设计的。它提供了一个简单和用户友好的API,并使机器学习从业者能够通过向其代码添加两条线来制作培训管道私有。它支持各种各样的层,包括多主题注意,卷积,LSTM和嵌入框,它还提供了支持其他用户定义的图层的手段。遮光罩计算的每个样本梯度批量,与传统的“微批量”方法相比,提供更好的效率。在本文中,我们介绍了opacus,详细介绍了推动其实现和独特功能的原则,并将其对ML中差别隐私的其他框架进行比较。
translated by 谷歌翻译
In recent years multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANN) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite action of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two stream relational network that employs latent representation of state and visual representation for reasoning over events and actions. They learn to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition, over a I3D-NL baseline, on the CATER dataset.
translated by 谷歌翻译
Motivated by mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g. privacy, fairness, and model transparency. Yet, we argue this is fundamentally misguided because these definitions are imperfect, siloed constructions of the human values they hope to proxy, while giving the guise that those values are sufficiently embedded in our technologies. Under popularized methods, tensions arise when practitioners attempt to achieve each pillar of fairness, privacy, and transparency in isolation or simultaneously. In this position paper, we push for redirection. We argue that the AI community needs to consider all the consequences of choosing certain formulations of these pillars -- not just the technical incompatibilities, but also the effects within the context of deployment. We point towards sociotechnical research for frameworks for the latter, but push for broader efforts into implementing these in practice.
translated by 谷歌翻译
Developing and least developed countries face the dire challenge of ensuring that each child in their country receives required doses of vaccination, adequate nutrition and proper medication. International agencies such as UNICEF, WHO and WFP, among other organizations, strive to find innovative solutions to determine which child has received the benefits and which have not. Biometric recognition systems have been sought out to help solve this problem. To that end, this report establishes a baseline accuracy of a commercial contactless palmprint recognition system that may be deployed for recognizing children in the age group of one to five years old. On a database of contactless palmprint images of one thousand unique palms from 500 children, we establish SOTA authentication accuracy of 90.85% @ FAR of 0.01%, rank-1 identification accuracy of 99.0% (closed set), and FPIR=0.01 @ FNIR=0.3 for open-set identification using PalmMobile SDK from Armatura.
translated by 谷歌翻译
Reliable uncertainty quantification in deep neural networks is very crucial in safety-critical applications such as automated driving for trustworthy and informed decision-making. Assessing the quality of uncertainty estimates is challenging as ground truth for uncertainty estimates is not available. Ideally, in a well-calibrated model, uncertainty estimates should perfectly correlate with model error. We propose a novel error aligned uncertainty optimization method and introduce a trainable loss function to guide the models to yield good quality uncertainty estimates aligning with the model error. Our approach targets continuous structured prediction and regression tasks, and is evaluated on multiple datasets including a large-scale vehicle motion prediction task involving real-world distributional shifts. We demonstrate that our method improves average displacement error by 1.69% and 4.69%, and the uncertainty correlation with model error by 17.22% and 19.13% as quantified by Pearson correlation coefficient on two state-of-the-art baselines.
translated by 谷歌翻译
Changes in real-world dynamic processes are often described in terms of differences in energies $\textbf{E}(\underline{\alpha})$ of a set of spectral-bands $\underline{\alpha}$. Given continuous spectra of two classes $A$ and $B$, or in general, two stochastic processes $S^{(A)}(f)$ and $S^{(B)}(f)$, $f \in \mathbb{R}^+$, we address the ubiquitous problem of identifying a subset of intervals of $f$ called spectral-bands $\underline{\alpha} \subset \mathbb{R}^+$ such that the energies $\textbf{E}(\underline{\alpha})$ of these bands can optimally discriminate between the two classes. We introduce EGO-MDA, an unsupervised method to identify optimal spectral-bands $\underline{\alpha}^*$ for given samples of spectra from two classes. EGO-MDA employs a statistical approach that iteratively minimizes an adjusted multinomial log-likelihood (deviance) criterion $\mathcal{D}(\underline{\alpha},\mathcal{M})$. Here, Mixture Discriminant Analysis (MDA) aims to derive MLE of two GMM distribution parameters, i.e., $\mathcal{M}^* = \underset{\mathcal{M}}{\rm argmin}~\mathcal{D}(\underline{\alpha}, \mathcal{M})$ and identify a classifier that optimally discriminates between two classes for a given spectral representation. The Efficient Global Optimization (EGO) finds the spectral-bands $\underline{\alpha}^* = \underset{\underline{\alpha}}{\rm argmin}~\mathcal{D}(\underline{\alpha},\mathcal{M})$ for given GMM parameters $\mathcal{M}$. For pathological cases of low separation between mixtures and model misspecification, we discuss the effect of the sample size and the number of iterations on the estimates of parameters $\mathcal{M}$ and therefore the classifier performance. A case study on a synthetic data set is provided. In an engineering application of optimal spectral-banding for anomaly tracking, EGO-MDA achieved at least 70% improvement in the median deviance relative to other methods tested.
translated by 谷歌翻译
Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate state along the path that the denoising would follow given the original conditioning. However, DDIM inversion for real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion. Using Stable Diffusion, a state-of-the-art latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex image datasets like MS-COCO, EDICT reconstruction significantly outperforms DDIM, improving the mean square error of reconstruction by a factor of two. Using noise vectors inverted from real images, EDICT enables a wide range of image edits--from local and global semantic edits to image stylization--while maintaining fidelity to the original image structure. EDICT requires no model training/finetuning, prompt tuning, or extra data and can be combined with any pretrained DDM. Code is available at https://github.com/salesforce/EDICT.
translated by 谷歌翻译
Many scientific domains gather sufficient labels to train machine algorithms through human-in-the-loop techniques provided by the Zooniverse.org citizen science platform. As the range of projects, task types and data rates increase, acceleration of model training is of paramount concern to focus volunteer effort where most needed. The application of Transfer Learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets vs. images with similar characteristics possibly from similar tasks is an open challenge. We apply a generative segmentation model on two Zooniverse project-based data sets: (1) to identify fat droplets in liver cells (FatChecker; FC) and (2) the identification of kelp beds in satellite images (Floating Forests; FF) through transfer learning from the first project. We compare and contrast its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into usage of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion.
translated by 谷歌翻译
自动化车辆功能最佳接受和舒适性的关键因素是驾驶方式。自动化和驱动程序偏爱的驾驶方式之间的不匹配可以使用户更频繁地接管甚至禁用自动化功能。这项工作建议用多模式信号识别用户驾驶样式偏好,因此该车辆可以以连续自动的方式匹配用户偏好。我们对36名参与者进行了驾驶模拟器研究,并收集了广泛的多模式数据,包括行为,生理和情境数据。这包括眼目光,转向抓地力,驾驶演习,制动和节气门踏板输入以及距踏板的脚距离,瞳孔直径,电流皮肤反应,心率和情境驱动驱动环境。然后,我们建立了机器学习模型来识别首选的驾驶方式,并确认所有模式对于识别用户偏好都很重要。这项工作为自动车辆的隐性自适应驾驶风格铺平了道路。
translated by 谷歌翻译
制作对抗性攻击的大多数方法都集中在具有单个主体对象的场景上(例如,来自Imagenet的图像)。另一方面,自然场景包括多个在语义上相关的主要对象。因此,探索设计攻击策略至关重要,这些攻击策略超出了在单对象场景上学习或攻击单对象受害者分类器。由于其固有的属性将扰动向未知模型的强大可传递性强,因此本文介绍了使用生成模型对多对象场景的对抗性攻击的第一种方法。为了代表输入场景中不同对象之间的关系,我们利用开源的预训练的视觉语言模型剪辑(对比语言图像 - 预训练),并动机利用语言中的编码语义来利用编码的语义空间与视觉空间一起。我们称这种攻击方法生成对抗性多对象场景攻击(GAMA)。 GAMA展示了剪辑模型作为攻击者的工具的实用性,以训练可强大的扰动发电机为多对象场景。使用联合图像文本功能来训练发电机,我们表明GAMA可以在各种攻击环境中制作有效的可转移扰动,以欺骗受害者分类器。例如,GAMA触发的错误分类比在黑框设置中的最新生成方法高出约16%,在黑框设置中,分类器体系结构和攻击者的数据分布都与受害者不同。我们的代码将很快公开提供。
translated by 谷歌翻译